Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences
Internal identifier: 000A37 (Main/Exploration); previous: 000A36; next: 000A38
Authors: Han Li [United States]; Fengzhu Sun [United States, People's Republic of China]
Source:
- Scientific Reports [2045-2322]; 2018.
French descriptors
- KwdFr :
- ADN viral (analyse), Alignement de séquences (), Analyse de séquence d'ADN (), Coronavirus (génétique), Coronavirus du syndrome respiratoire du Moyen-Orient (génétique), Glycoprotéine de spicule des coronavirus (génétique), Génome viral (génétique), Machine à vecteur de support, Modèles théoriques, Pandémies, Phylogénie, Virus de la grippe A (génétique), Virus de la rage (génétique).
- MESH :
- analyse : ADN viral.
- génétique : Coronavirus, Coronavirus du syndrome respiratoire du Moyen-Orient, Glycoprotéine de spicule des coronavirus, Génome viral, Virus de la grippe A, Virus de la rage.
- Alignement de séquences, Analyse de séquence d'ADN, Machine à vecteur de support, Modèles théoriques, Pandémies, Phylogénie.
English descriptors
- KwdEn :
- Coronavirus (genetics), DNA, Viral (analysis), Genome, Viral (genetics), Host Microbial Interactions (genetics), Host Microbial Interactions (physiology), Influenza A virus (genetics), Middle East Respiratory Syndrome Coronavirus (genetics), Models, Theoretical, Pandemics, Phylogeny, Rabies virus (genetics), Sequence Alignment (methods), Sequence Analysis, DNA (methods), Spike Glycoprotein, Coronavirus (genetics), Support Vector Machine.
- MESH :
- chemical , analysis : DNA, Viral.
- genetics : Coronavirus, Genome, Viral, Host Microbial Interactions, Influenza A virus, Middle East Respiratory Syndrome Coronavirus, Rabies virus, Spike Glycoprotein, Coronavirus.
- methods : Sequence Alignment, Sequence Analysis, DNA.
- physiology : Host Microbial Interactions.
- Models, Theoretical, Pandemics, Phylogeny, Support Vector Machine.
Abstract
Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer k-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses.
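The abstract names mononucleotide frequency and dinucleotide bias as the SVM input features but does not define them. The following is a minimal sketch of one plausible way to compute such features, assuming the common odds-ratio form of dinucleotide bias, f(xy) / (f(x) f(y)). The function names and the 20-dimensional feature layout are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def mononucleotide_frequency(seq):
    """Relative frequency of each base; symbols other than A, C, G, T are ignored."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in BASES}
    return {b: counts[b] / total for b in BASES}


def dinucleotide_bias(seq):
    """Observed/expected ratio f(xy) / (f(x) * f(y)) for all 16 dinucleotides.

    Assumption: this odds-ratio definition is a common convention; the
    abstract itself does not spell out the formula.
    """
    seq = seq.upper()
    mono = mononucleotide_frequency(seq)
    # Overlapping dinucleotides made only of unambiguous bases.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in BASES and p[1] in BASES]
    counts = Counter(pairs)
    total = len(pairs)
    bias = {}
    for x, y in product(BASES, repeat=2):
        expected = mono[x] * mono[y]
        observed = counts[x + y] / total if total else 0.0
        bias[x + y] = observed / expected if expected > 0 else 0.0
    return bias


def feature_vector(seq):
    """Concatenate 4 mononucleotide frequencies and 16 dinucleotide biases."""
    mono = mononucleotide_frequency(seq)
    bias = dinucleotide_bias(seq)
    return [mono[b] for b in BASES] + [bias[x + y] for x, y in product(BASES, repeat=2)]
```

A vector built this way could be passed, one row per viral sequence with the host as label, to any standard SVM implementation; the choice of kernel and parameters is not stated in the abstract.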
Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030160
DOI: 10.1038/s41598-018-28308-x
PubMed: 29968780
PubMed Central: 6030160
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Pmc, to step Corpus: 000451
- to stream Pmc, to step Curation: 000451
- to stream Pmc, to step Checkpoint: 000620
- to stream PubMed, to step Corpus: 000848
- to stream PubMed, to step Curation: 000848
- to stream PubMed, to step Checkpoint: 000984
- to stream Ncbi, to step Merge: 001E87
- to stream Ncbi, to step Curation: 001E87
- to stream Ncbi, to step Checkpoint: 001E87
- to stream Main, to step Merge: 000A40
- to stream Main, to step Curation: 000A37
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences</title>
<author><name sortKey="Li, Han" sort="Li, Han" uniqKey="Li H" first="Han" last="Li">Han Li</name>
<affiliation wicri:level="2"><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
<affiliation wicri:level="2"><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
<affiliation wicri:level="1"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 0125 2443</institution-id>
<institution-id institution-id-type="GRID">grid.8547.e</institution-id>
<institution>Centre for Computational Systems Biology, School of Mathematical Sciences,</institution>
<institution>Fudan University,</institution>
</institution-wrap>
Shanghai, 200433 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Shanghai</wicri:regionArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PMC</idno>
<idno type="pmid">29968780</idno>
<idno type="pmc">6030160</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030160</idno>
<idno type="RBID">PMC:6030160</idno>
<idno type="doi">10.1038/s41598-018-28308-x</idno>
<date when="2018">2018</date>
<idno type="wicri:Area/Pmc/Corpus">000451</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000451</idno>
<idno type="wicri:Area/Pmc/Curation">000451</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000451</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000620</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000620</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:29968780</idno>
<idno type="wicri:Area/PubMed/Corpus">000848</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000848</idno>
<idno type="wicri:Area/PubMed/Curation">000848</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000848</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000984</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000984</idno>
<idno type="wicri:Area/Ncbi/Merge">001E87</idno>
<idno type="wicri:Area/Ncbi/Curation">001E87</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001E87</idno>
<idno type="wicri:Area/Main/Merge">000A40</idno>
<idno type="wicri:Area/Main/Curation">000A37</idno>
<idno type="wicri:Area/Main/Exploration">000A37</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a" type="main">Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences</title>
<author><name sortKey="Li, Han" sort="Li, Han" uniqKey="Li H" first="Han" last="Li">Han Li</name>
<affiliation wicri:level="2"><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
<affiliation wicri:level="2"><nlm:aff id="Aff1"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 2156 6853</institution-id>
<institution-id institution-id-type="GRID">grid.42505.36</institution-id>
<institution>Molecular and Computational Biology Program, Department of Biological Sciences,</institution>
<institution>University of Southern California,</institution>
</institution-wrap>
Los Angeles, CA 90089 USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Californie</region>
</placeName>
<wicri:cityArea>Los Angeles</wicri:cityArea>
</affiliation>
<affiliation wicri:level="1"><nlm:aff id="Aff2"><institution-wrap><institution-id institution-id-type="ISNI">0000 0001 0125 2443</institution-id>
<institution-id institution-id-type="GRID">grid.8547.e</institution-id>
<institution>Centre for Computational Systems Biology, School of Mathematical Sciences,</institution>
<institution>Fudan University,</institution>
</institution-wrap>
Shanghai, 200433 China</nlm:aff>
<country xml:lang="fr">République populaire de Chine</country>
<wicri:regionArea>Shanghai</wicri:regionArea>
</affiliation>
</author>
</analytic>
<series><title level="j">Scientific Reports</title>
<idno type="eISSN">2045-2322</idno>
<imprint><date when="2018">2018</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Coronavirus (genetics)</term>
<term>DNA, Viral (analysis)</term>
<term>Genome, Viral (genetics)</term>
<term>Host Microbial Interactions (genetics)</term>
<term>Host Microbial Interactions (physiology)</term>
<term>Influenza A virus (genetics)</term>
<term>Middle East Respiratory Syndrome Coronavirus (genetics)</term>
<term>Models, Theoretical</term>
<term>Pandemics</term>
<term>Phylogeny</term>
<term>Rabies virus (genetics)</term>
<term>Sequence Alignment (methods)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Spike Glycoprotein, Coronavirus (genetics)</term>
<term>Support Vector Machine</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>ADN viral (analyse)</term>
<term>Alignement de séquences ()</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Coronavirus (génétique)</term>
<term>Coronavirus du syndrome respiratoire du Moyen-Orient (génétique)</term>
<term>Glycoprotéine de spicule des coronavirus (génétique)</term>
<term>Génome viral (génétique)</term>
<term>Machine à vecteur de support</term>
<term>Modèles théoriques</term>
<term>Pandémies</term>
<term>Phylogénie</term>
<term>Virus de la grippe A (génétique)</term>
<term>Virus de la rage (génétique)</term>
</keywords>
<keywords scheme="MESH" type="chemical" qualifier="analysis" xml:lang="en"><term>DNA, Viral</term>
</keywords>
<keywords scheme="MESH" qualifier="analyse" xml:lang="fr"><term>ADN viral</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en"><term>Coronavirus</term>
<term>Genome, Viral</term>
<term>Host Microbial Interactions</term>
<term>Influenza A virus</term>
<term>Middle East Respiratory Syndrome Coronavirus</term>
<term>Rabies virus</term>
<term>Spike Glycoprotein, Coronavirus</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr"><term>Coronavirus</term>
<term>Coronavirus du syndrome respiratoire du Moyen-Orient</term>
<term>Glycoprotéine de spicule des coronavirus</term>
<term>Génome viral</term>
<term>Virus de la grippe A</term>
<term>Virus de la rage</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Sequence Alignment</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" qualifier="physiology" xml:lang="en"><term>Host Microbial Interactions</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Models, Theoretical</term>
<term>Pandemics</term>
<term>Phylogeny</term>
<term>Support Vector Machine</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Alignement de séquences</term>
<term>Analyse de séquence d'ADN</term>
<term>Machine à vecteur de support</term>
<term>Modèles théoriques</term>
<term>Pandémies</term>
<term>Phylogénie</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p id="Par1">Predicting the hosts of newly discovered viruses is important for pandemic surveillance of infectious diseases. We investigated the use of alignment-based and alignment-free methods and support vector machine using mononucleotide frequency and dinucleotide bias to predict the hosts of viruses, and applied these approaches to three datasets: rabies virus, coronavirus, and influenza A virus. For coronavirus, we used the spike gene sequences, while for rabies and influenza A viruses, we used the more conserved nucleoprotein gene sequences. We compared the three methods under different scenarios and showed that their performances are highly correlated with the variability of sequences and sample size. For conserved genes like the nucleoprotein gene, longer <italic>k</italic>-mers than mono- and dinucleotides are needed to better distinguish the sequences. We also showed that both alignment-based and alignment-free methods can accurately predict the hosts of viruses. When alignment is difficult to achieve or highly time-consuming, alignment-free methods can be a promising substitute to predict the hosts of new viruses.</p>
</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
<li>États-Unis</li>
</country>
<region><li>Californie</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Californie"><name sortKey="Li, Han" sort="Li, Han" uniqKey="Li H" first="Han" last="Li">Han Li</name>
</region>
<name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
</country>
<country name="République populaire de Chine"><noRegion><name sortKey="Sun, Fengzhu" sort="Sun, Fengzhu" uniqKey="Sun F" first="Fengzhu" last="Sun">Fengzhu Sun</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A37 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A37 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Sante |area= MersV1 |flux= Main |étape= Exploration |type= RBID |clé= PMC:6030160 |texte= Comparative studies of alignment, alignment-free and SVM based approaches for predicting the hosts of viruses based on viral sequences }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:29968780" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
  | NlmPubMed2Wicri -a MersV1
This area was generated with Dilib version V0.6.33.